Extending standoff annotation

Author

  • Maik Stührenberg
Abstract

Information encoding is often complex. Textual information is sometimes accompanied by additional encodings (such as visuals). These multimodal documents may be interesting objects of investigation for linguistics. Another class of complex documents is pre-annotated documents. Classic XML inline annotation often fails for both document classes because of overlapping markup. However, standoff annotation, that is, the separation of primary data and markup, is a valuable and common mechanism for annotating multiple hierarchies and/or read-only primary data. We demonstrate an extended version of the XStandoff meta markup language that allows the definition of segments in spatial and pre-annotated primary data. Together with the ability to import already established (linguistic) serialization formats as annotation levels and layers in an XStandoff instance, this allows us to annotate a variety of primary data files, including text, audio, and still and moving images. Application scenarios that may benefit from using XStandoff include the analysis of multimodal documents such as instruction manuals, sports match analysis, and the less destructive cleaning of web pages.
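To make the overlap problem concrete, here is a minimal sketch of the general standoff idea: the primary data stays untouched and each annotation layer only points into it by character offsets. This is not XStandoff's actual serialization; the `Span` class, the layer names, and the sample sentence are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Span:
    """Hypothetical standoff segment: a layer name plus character offsets."""
    layer: str
    start: int  # inclusive offset into the primary data
    end: int    # exclusive offset into the primary data


# Read-only primary data; annotations never modify this string.
PRIMARY = "The sun shines brighter."

# Two illustrative layers over the word "brighter" (offsets 15..23).
SPANS = [
    Span("syllables", 15, 20),  # "brigh"
    Span("syllables", 20, 23),  # "ter"
    Span("morphemes", 15, 21),  # "bright"
    Span("morphemes", 21, 23),  # "er"
]


def text_of(span: Span) -> str:
    """Resolve a segment back to the text it points at."""
    return PRIMARY[span.start:span.end]


def crosses(a: Span, b: Span) -> bool:
    """True if the spans partially overlap, so neither can nest in the other."""
    return a.start < b.start < a.end < b.end or b.start < a.start < b.end < a.end


def crossing_pairs(spans):
    """List the text of every pair of spans that could not be nested inline."""
    return [
        (text_of(a), text_of(b))
        for a, b in combinations(spans, 2)
        if crosses(a, b)
    ]


if __name__ == "__main__":
    print(crossing_pairs(SPANS))
```

The syllable "ter" and the morpheme "bright" cross each other, so they cannot both be well-formed inline XML elements over the same text, while as offset-based segments they coexist without conflict.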

Similar resources

Efficient XQuery Support for Stand-Off Annotation

XML annotations are a widely occurring phenomenon in many application fields, and XML databases should be used to store and query such data. To provide intuitive and fast querying of annotations, we make a case for extending XPath with four new axis steps that correspond to so-called StandOff joins, introduced here. The new steps can be efficiently implemented using a region index and fast lo...

Less Destructive Cleaning of Web Documents by Using Standoff Annotation

Standoff annotation, that is, the separation of primary data and markup, can be an interesting option for annotating web pages since it does not demand the removal of annotations already present in web pages. We will present a standoff serialization that allows for annotating well-formed web pages with multiple annotation layers in a single instance, easing the processing and analysis of the data.

A Standoff Annotation Interface between DELPH-IN Components

We present a standoff annotation framework for the integration of NLP components, currently implemented in the context of the DELPH-IN tools. This provides a flexible standoff pointer scheme suitable for various types of data; a lattice encodes structural ambiguity, intra-annotation relationships are encoded, and annotations are decorated with structured content. We provide an XML serialization...

Iula2Standoff: a tool for creating standoff documents for the IULACT

Due to the increase in the number and depth of analyses required over the text, such as entity recognition, POS tagging, and syntactic analysis, inline annotation has become impractical. In Natural Language Processing (NLP), some emphasis has been placed on finding an annotation method to solve this problem. One possibility is standoff annotation. With this annotation style it is possible t...

Standoff Coordination for Multi-Tool Annotation in a Dialogue Corpus

The LUNA corpus is a multilingual, multi-domain spoken dialogue corpus currently under development that will be used to develop a robust natural spoken language understanding toolkit for multilingual dialogue services. The LUNA corpus will be annotated at multiple levels to include annotations of syntactic, semantic, and discourse information; specialized annotation tools will be used for the a...

Publication date: 2014